<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

    <meta http-equiv="x-ua-compatible" content="ie=edge">
    
    <title>3.2.1. 二阶特征交叉 &#8212; FunRec 推荐系统 0.0.1 documentation</title>

    <link rel="stylesheet" href="../../_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/sphinx_materialdesign_theme.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fontawesome/all.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fonts.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/basic.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/d2l.css" />
    <script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
    <script src="../../_static/jquery.js"></script>
    <script src="../../_static/underscore.js"></script>
    <script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="../../_static/doctools.js"></script>
    <script src="../../_static/sphinx_highlight.js"></script>
    <script src="../../_static/d2l.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="../../genindex.html" />
    <link rel="search" title="Search" href="../../search.html" />
    <link rel="next" title="3.2.2. 高阶特征交叉" href="2.higher_order.html" />
    <link rel="prev" title="3.2. 特征交叉" href="index.html" /> 
  </head>
<body>
    <div class="mdl-layout mdl-js-layout mdl-layout--fixed-header mdl-layout--fixed-drawer"><header class="mdl-layout__header mdl-layout__header--waterfall ">
    <div class="mdl-layout__header-row">
        
        <nav class="mdl-navigation breadcrumb">
            <a class="mdl-navigation__link" href="../index.html"><span class="section-number">3. </span>精排模型</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link" href="index.html"><span class="section-number">3.2. </span>特征交叉</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link is-active"><span class="section-number">3.2.1. </span>二阶特征交叉</a>
        </nav>
        <div class="mdl-layout-spacer"></div>
        <nav class="mdl-navigation">
        
<form class="form-inline pull-sm-right" action="../../search.html" method="get">
      <div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right">
        <label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon"  for="waterfall-exp">
          <i class="material-icons">search</i>
        </label>
        <div class="mdl-textfield__expandable-holder">
          <input class="mdl-textfield__input" type="text" name="q"  id="waterfall-exp" placeholder="Search" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </div>
      </div>
      <div class="mdl-tooltip" data-mdl-for="quick-search-icon">
      Quick search
      </div>
</form>
        
<a id="button-show-source"
    class="mdl-button mdl-js-button mdl-button--icon"
    href="../../_sources/chapter_2_ranking/2.feature_crossing/1.second_order.rst.txt" rel="nofollow">
  <i class="material-icons">code</i>
</a>
<div class="mdl-tooltip" data-mdl-for="button-show-source">
Show Source
</div>
        </nav>
    </div>
    <div class="mdl-layout__header-row header-links">
      <div class="mdl-layout-spacer"></div>
      <nav class="mdl-navigation">
          
              <a  class="mdl-navigation__link" href="https://funrec-notebooks.s3.eu-west-3.amazonaws.com/fun-rec.zip">
                  <i class="fas fa-download"></i>
                  Jupyter 记事本
              </a>
          
              <a  class="mdl-navigation__link" href="https://github.com/datawhalechina/fun-rec">
                  <i class="fab fa-github"></i>
                  GitHub
              </a>
      </nav>
    </div>
</header><header class="mdl-layout__drawer">
    
          <!-- Title -->
      <span class="mdl-layout-title">
          <a class="title" href="../../index.html">
              <span class="title-text">
                  FunRec 推荐系统
              </span>
          </a>
      </span>
    
    
      <div class="globaltoc">
        <span class="mdl-layout-title toc">Table Of Contents</span>
        
        
            
            <nav class="mdl-navigation">
                <ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_preface/index.html">前言</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_installation/index.html">安装</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_notation/index.html">符号</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../chapter_0_introduction/index.html">1. 推荐系统概述</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/1.intro.html">1.1. 推荐系统是什么？</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/2.outline.html">1.2. 本书概览</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_1_retrieval/index.html">2. 召回模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/index.html">2.1. 协同过滤</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/1.itemcf.html">2.1.1. 基于物品的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/2.usercf.html">2.1.2. 基于用户的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/3.mf.html">2.1.3. 矩阵分解</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/4.summary.html">2.1.4. 总结</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/index.html">2.2. 向量召回</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/1.i2i.html">2.2.1. I2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/2.u2i.html">2.2.2. U2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/3.summary.html">2.2.3. 总结</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/index.html">2.3. 序列召回</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/1.user_interests.html">2.3.1. 深化用户兴趣表示</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/2.generateive_recall.html">2.3.2. 生成式召回方法</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/3.summary.html">2.3.3. 总结</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">3. 精排模型</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../1.wide_and_deep.html">3.1. 记忆与泛化</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">3.2. 特征交叉</a><ul class="current">
<li class="toctree-l3 current"><a class="current reference internal" href="#">3.2.1. 二阶特征交叉</a></li>
<li class="toctree-l3"><a class="reference internal" href="2.higher_order.html">3.2.2. 高阶特征交叉</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../3.sequence.html">3.3. 序列建模</a></li>
<li class="toctree-l2"><a class="reference internal" href="../4.multi_objective/index.html">3.4. 多目标建模</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/1.arch.html">3.4.1. 基础结构演进</a></li>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/2.dependency_modeling.html">3.4.2. 任务依赖建模</a></li>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/3.multi_loss_optim.html">3.4.3. 多目标损失融合</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../5.multi_scenario/index.html">3.5. 多场景建模</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../5.multi_scenario/1.multi_tower.html">3.5.1. 多塔结构</a></li>
<li class="toctree-l3"><a class="reference internal" href="../5.multi_scenario/2.dynamic_weight.html">3.5.2. 动态权重建模</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_3_rerank/index.html">4. 重排模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/1.greedy.html">4.1. 基于贪心的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/2.personalized.html">4.2. 基于个性化的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/3.summary.html">4.3. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_4_trends/index.html">5. 难点及热点研究</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/1.debias.html">5.1. 模型去偏</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/2.cold_start.html">5.2. 冷启动问题</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/3.generative.html">5.3. 生成式推荐</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/4.summary.html">5.4. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_5_projects/index.html">6. 项目实践</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/1.understanding.html">6.1. 赛题理解</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/2.baseline.html">6.2. Baseline</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/3.analysis.html">6.3. 数据分析</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/4.recall.html">6.4. 多路召回</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/5.feature_engineering.html">6.5. 特征工程</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/6.ranking.html">6.6. 排序模型</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_appendix/index.html">7. Appendix</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_appendix/word2vec.html">7.1. Word2vec</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_references/references.html">参考文献</a></li>
</ul>

            </nav>
        
        </div>
    
</header>
        <main class="mdl-layout__content" tabIndex="0">

	<script type="text/javascript" src="../../_static/sphinx_materialdesign_theme.js"></script>

    <div class="document">
        <div class="page-content" role="main">
        
  <section id="second-order-feature-crossing">
<span id="id1"></span><h1><span class="section-number">3.2.1. </span>二阶特征交叉<a class="headerlink" href="#second-order-feature-crossing" title="Permalink to this heading">¶</a></h1>
<p>Wide &amp; Deep
模型虽然有效，但 Wide 侧的交叉特征需要依靠人工经验逐一设计，成本很高。能不能让模型自动学习特征之间的交叉关系？这就是本节要解决的问题。最直接的想法是让模型自动捕捉所有特征对之间的交互关系。但这里有一个大问题：推荐系统的特征动辄成千上万，如果每两个特征的组合都要学一个独立参数，参数量会爆炸；而且推荐数据本身高度稀疏，大部分特征组合根本没有足够的样本来训练。</p>
<p>所以关键是要找到一种巧妙的方法，既能自动学习特征交叉，又不会让参数太多。解决了这个问题后，还得考虑怎么把这些学到的交叉特征和深度网络结合起来。</p>
<section id="fm">
<h2><span class="section-number">3.2.1.1. </span>FM: 从召回到精排的华丽转身<a class="headerlink" href="#fm" title="Permalink to this heading">¶</a></h2>
<p>还记得我们在召回章节 <a class="reference internal" href="../../chapter_1_retrieval/2.embedding/2.u2i.html#fm-matching-model"><span class="std std-numref">2.2.2.1节</span></a>
遇到的FM吗？当时我们看到它如何巧妙地将用户和物品分解成向量，通过内积实现高效的双塔召回。现在到了精排阶段，FM又要展现它的另一面了。</p>
<p>在召回时，FM主要解决的是“如何快速从海量物品中找到候选集”的问题。但在精排阶段，我们面临的挑战完全不同：<strong>如何自动学习特征之间的交叉关系，而不用手工一个个去设计</strong>。</p>
<p>这时候FM的核心思想就派上用场了——<strong>给每个特征学一个向量表示，然后用向量内积来捕捉特征间的关系</strong>。听起来很简单对吧？但这个简单的想法解决了一个大问题：不管你有多少特征，不管特征组合有多复杂，都能用同一套方法来处理。最关键的是，参数数量不会爆炸式增长，这对于推荐系统这种特征超多的场景来说太重要了。</p>
<figure class="align-default" id="id9">
<span id="fm-model-structure"></span><a class="reference internal image-reference" href="../../_images/fm_model.png"><img alt="../../_images/fm_model.png" src="../../_images/fm_model.png" style="width: 400px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.1 </span><span class="caption-text">FM模型结构</span><a class="headerlink" href="#id9" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>为了捕捉特征间的交互关系，一个直接的想法是在线性模型的基础上增加所有特征的二阶组合项，即多项式模型：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-0">
<span class="eqno">(3.2.1)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-0" title="Permalink to this equation">¶</a></span>\[y = w_0 + \sum_{i=1}^n w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^n w_{ij} x_i x_j\]</div>
<p>其中，<span class="math notranslate nohighlight">\(w_0\)</span> 是全局偏置项，<span class="math notranslate nohighlight">\(w_i\)</span> 是特征 <span class="math notranslate nohighlight">\(x_i\)</span>
的权重，<span class="math notranslate nohighlight">\(w_{ij}\)</span> 是特征 <span class="math notranslate nohighlight">\(x_i\)</span> 和 <span class="math notranslate nohighlight">\(x_j\)</span>
交互的权重，<span class="math notranslate nohighlight">\(n\)</span>
是特征数量。这个模型存在两个致命缺陷：第一，参数数量会达到
<span class="math notranslate nohighlight">\(O(n^2)\)</span>
的级别，在特征数量庞大的推荐场景下难以承受；第二，在数据高度稀疏的环境中，绝大多数的交叉特征
<span class="math notranslate nohighlight">\(x_i x_j\)</span> 因为在训练集中从未共同出现过，导致其对应的权重
<span class="math notranslate nohighlight">\(w_{ij}\)</span> 无法得到有效学习。</p>
<p>FM 模型巧妙地解决了这个问题。它将交互权重 <span class="math notranslate nohighlight">\(w_{ij}\)</span>
分解为两个低维隐向量的内积，即
<span class="math notranslate nohighlight">\(w_{ij}=\langle\mathbf{v}_i,\mathbf{v}_j\rangle\)</span>。这样，模型的预测公式就演变为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-1">
<span class="eqno">(3.2.2)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-1" title="Permalink to this equation">¶</a></span>\[y = w_0 + \sum_{i=1}^n w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^n \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j\]</div>
<p>其中<span class="math notranslate nohighlight">\(\mathbf{v}_i,\mathbf{v}_j\)</span> 分别是特征 <span class="math notranslate nohighlight">\(i\)</span> 和特征
<span class="math notranslate nohighlight">\(j\)</span> 的 <span class="math notranslate nohighlight">\(k\)</span> 维隐向量（Embedding）。<span class="math notranslate nohighlight">\(k\)</span>
是一个远小于特征数量 <span class="math notranslate nohighlight">\(n\)</span>
的超参数，<span class="math notranslate nohighlight">\(\langle \mathbf{v}_i,\mathbf{v}_j \rangle\)</span>
表示两个隐向量的内积，计算方式为
<span class="math notranslate nohighlight">\(\sum_{f=1}^k v_{i,f} \cdot v_{j,f}\)</span>。</p>
<p>这种<strong>参数共享</strong>的设计是 FM 的精髓所在。原本需要学习 <span class="math notranslate nohighlight">\(O(n^2)\)</span>
个独立的交叉权重 <span class="math notranslate nohighlight">\(w_{ij}\)</span>，现在只需要为每个特征学习一个
<span class="math notranslate nohighlight">\(k\)</span> 维的隐向量 <span class="math notranslate nohighlight">\(v_i\)</span>，总参数量就从 <span class="math notranslate nohighlight">\(O(n^2)\)</span> 降低到了
<span class="math notranslate nohighlight">\(O(nk)\)</span>。更重要的是，它极大地缓解了数据稀疏问题。即使特征
<span class="math notranslate nohighlight">\(i\)</span> 和 <span class="math notranslate nohighlight">\(j\)</span>
在训练样本中从未同时出现过，模型依然可以通过它们各自与其他特征（如特征
<span class="math notranslate nohighlight">\(l\)</span>）的共现数据，分别学到有效的隐向量 <span class="math notranslate nohighlight">\(v_i\)</span> 和
<span class="math notranslate nohighlight">\(v_j\)</span>。只要隐向量学习得足够好，模型就能够泛化并预测 <span class="math notranslate nohighlight">\(x_i\)</span>
和 <span class="math notranslate nohighlight">\(x_j\)</span> 的交叉效果。此外，通过巧妙的数学变换，FM
的二阶交叉项计算复杂度可以从 <span class="math notranslate nohighlight">\(O(kn^2)\)</span> 优化到线性的
<span class="math notranslate nohighlight">\(O(kn)\)</span>（推导见式 <a class="reference internal" href="../../chapter_1_retrieval/2.embedding/2.u2i.html#equation-eq-fm-cross">(2.2.12)</a>），使其在工业界得到了广泛应用。</p>
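<p>可以用一组假设的数量级直观感受这一差距（下面的 <code>n</code>、<code>k</code> 均为示意取值，并非某个真实系统的配置）：</p>

```python
# 多项式模型与 FM 的二阶参数量对比（n、k 均为示意取值）
n = 100_000  # 特征数量
k = 16       # FM 隐向量维度

poly_params = n * (n - 1) // 2  # 多项式模型：每对特征一个独立权重 w_ij
fm_params = n * k               # FM：每个特征一个 k 维隐向量

print(poly_params)  # 4999950000，约 50 亿
print(fm_params)    # 1600000，160 万
```

<p>在这一假设规模下，FM 的参数量约为多项式模型的三千分之一，且随特征数线性增长。</p>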
<p><strong>核心代码</strong></p>
<p>FM的核心在于将 <span class="math notranslate nohighlight">\(O(n^2)\)</span>
的二阶交叉项优化为线性复杂度。通过简单的代数变换，我们可以高效计算所有特征对的交互：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># FM层的核心计算：0.5 * ((sum(v))^2 - sum(v^2))</span>
<span class="c1"># inputs: [batch_size, field_num, embedding_size]</span>

<span class="c1"># 先求和再平方：(∑v_i)^2</span>
<span class="n">square_of_sum</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span>
    <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="p">)</span>  <span class="c1"># [B, 1, D]</span>

<span class="c1"># 先平方再求和：∑(v_i^2)</span>
<span class="n">sum_of_square</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span>
    <span class="n">inputs</span> <span class="o">*</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span>
<span class="p">)</span>  <span class="c1"># [B, 1, D]</span>

<span class="c1"># FM二阶交互项</span>
<span class="n">cross_term</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span>
    <span class="n">square_of_sum</span> <span class="o">-</span> <span class="n">sum_of_square</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">2</span>
<span class="p">)</span>  <span class="c1"># [B, 1]</span>
</pre></div>
</div>
<p>这个实现的巧妙之处在于，无论有多少特征，计算复杂度始终保持线性，使得FM能够处理推荐系统中常见的高维稀疏特征。</p>
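<p>可以用 NumPy 做一个小验证（变量名仅为示意）：暴力枚举所有特征对的 <span class="math notranslate nohighlight">\(\sum_{i&lt;j}\langle \mathbf{v}_i,\mathbf{v}_j\rangle\)</span>，与上述线性化公式的结果应当一致。</p>

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 4))  # 8 个特征的 4 维隐向量（为简化，假设各 x_i 均为 1）

# 暴力枚举：sum_{i<j} <v_i, v_j>，复杂度 O(k n^2)
brute = sum(V[i] @ V[j] for i in range(8) for j in range(i + 1, 8))

# 线性化：0.5 * ((sum v)^2 - sum v^2)，再对 embedding 维求和，复杂度 O(k n)
fast = 0.5 * (np.square(V.sum(axis=0)) - np.square(V).sum(axis=0)).sum()

assert np.isclose(brute, fast)
```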
</section>
<section id="afm">
<h2><span class="section-number">3.2.1.2. </span>AFM: 注意力加权的交叉特征<a class="headerlink" href="#afm" title="Permalink to this heading">¶</a></h2>
<p>FM
对所有特征交叉给予了相同的权重，但实际上不同交叉组合的重要性是不同的。AFM
<span id="id2">(<a class="reference internal" href="../../chapter_references/references.html#id48" title="Xiao, J., Ye, H., He, X., Zhang, H., Wu, F., &amp; Chua, T.-S. (2017). Attentional factorization machines: learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617.">Xiao <em>et al.</em>, 2017</a>)</span>
在此基础上引入注意力机制，为不同的特征交叉分配权重，使模型能关注到更重要的交互。例如，在预测一位用户是否会点击一条体育新闻时，“用户年龄=18-24岁”与“新闻类别=体育”的交叉，其重要性显然要高于“用户年龄=18-24岁”与“新闻发布时间=周三”的交叉。</p>
<figure class="align-default" id="id10">
<span id="afm-model-structure"></span><a class="reference internal image-reference" href="../../_images/afm_architecture.png"><img alt="../../_images/afm_architecture.png" src="../../_images/afm_architecture.png" style="width: 500px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.2 </span><span class="caption-text">AFM模型结构</span><a class="headerlink" href="#id10" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>AFM 的模型结构在 FM
的基础上进行了扩展。它首先将所有成对特征的隐向量进行<strong>元素积（Hadamard
Product，记为 <span class="math notranslate nohighlight">\(\odot\)</span>）</strong>，而不是像 FM
那样直接求内积。这样做保留了交叉特征的向量信息，为后续的注意力计算提供了输入。这个步骤被称为成对交互层（Pair-wise
Interaction Layer）。</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-2">
<span class="eqno">(3.2.3)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-2" title="Permalink to this equation">¶</a></span>\[f_{PI}(\mathcal{E}) = \{(v_i \odot v_j) x_i x_j \}_{(i,j) \in \mathcal{R}_x}\]</div>
<p>其中，<span class="math notranslate nohighlight">\(\mathcal{E}\)</span>
表示输入样本中所有非零特征的Embedding向量集合，<span class="math notranslate nohighlight">\(\mathcal{R}_x\)</span>
表示输入样本中所有非零特征的索引对集合。随后，模型引入一个注意力机制，来学习每个交叉特征
<span class="math notranslate nohighlight">\((v_i \odot v_j)\)</span> 的重要性得分 <span class="math notranslate nohighlight">\(a_{ij}\)</span>。</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-3">
<span class="eqno">(3.2.4)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-3" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
a_{ij}' &amp;= \textbf{h}^T \text{ReLU}(\textbf{W} (\mathbf{v}_i \odot \mathbf{v}_j) x_i x_j + \textbf{b}) \\
a_{ij} &amp;= \frac{\exp(a_{ij}')}{\sum_{(i,k) \in \mathcal{R}_x} \exp(a_{ik}')}
\end{aligned}\end{split}\]</div>
<p>其中，<span class="math notranslate nohighlight">\(\textbf{W}\)</span> 是注意力网络的权重矩阵，<span class="math notranslate nohighlight">\(\textbf{b}\)</span>
是偏置向量，<span class="math notranslate nohighlight">\(\textbf{h}\)</span> 是输出层向量。这个得分 <span class="math notranslate nohighlight">\(a_{ij}\)</span>
经过 Softmax
归一化后，被用作加权求和的权重，与原始的交叉特征向量相乘，最终汇总成一个向量。这个过程被称为注意力池化层（Attention-based
Pooling）。</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-4">
<span class="eqno">(3.2.5)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-4" title="Permalink to this equation">¶</a></span>\[f_{Att} = \sum_{(i,j) \in \mathcal{R}_x} a_{ij} (\mathbf{v}_i \odot \mathbf{v}_j) x_i x_j\]</div>
<p>最后，AFM
的完整预测公式由一阶线性部分和经过注意力加权的二阶交叉部分组成：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-5">
<span class="eqno">(3.2.6)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-5" title="Permalink to this equation">¶</a></span>\[\hat{y}_{afm}(x) = w_0 + \sum_{i=1}^n w_i x_i + \textbf{p}^T f_{Att}\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\textbf{p}\)</span>
是一个投影向量，用于将最终的交叉结果映射为标量。通过引入注意力机制，<strong>AFM
不仅提升了模型的表达能力，还通过可视化注意力权重</strong> <span class="math notranslate nohighlight">\(a_{ij}\)</span>
<strong>赋予了模型更好的可解释性</strong>，让我们可以洞察哪些特征交叉对预测结果的贡献最大。</p>
<p><strong>核心代码</strong></p>
<p>AFM的关键在于注意力池化层，它为每个特征交叉对分配不同的权重：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1. 计算所有特征对的元素积交互</span>
<span class="c1"># group_pairwise: [batch_size, num_pairs, embedding_dim]</span>
<span class="n">group_pairwise</span> <span class="o">=</span> <span class="n">pairwise_feature_interactions</span><span class="p">(</span>
    <span class="n">group_feature</span><span class="p">,</span> <span class="n">drop_rate</span><span class="o">=</span><span class="n">dropout_rate</span>
<span class="p">)</span>

<span class="c1"># 2. 注意力权重计算：h^T · ReLU(W · (v_i ⊙ v_j) + b)</span>
<span class="n">weighted_inputs</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span>
    <span class="n">group_pairwise</span><span class="p">,</span> <span class="n">attention_weight</span>
<span class="p">)</span> <span class="o">+</span> <span class="n">attention_bias</span>  <span class="c1"># [B, num_pairs, attention_factor]</span>

<span class="n">activation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">weighted_inputs</span><span class="p">)</span>
<span class="n">projected</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">activation</span><span class="p">,</span> <span class="n">attention_projection</span><span class="p">)</span>  <span class="c1"># [B, num_pairs, 1]</span>

<span class="c1"># 3. Softmax归一化得到注意力权重</span>
<span class="n">attention_weights</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">projected</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>

<span class="c1"># 4. 加权求和：∑ a_ij · (v_i ⊙ v_j)</span>
<span class="n">attention_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span>
    <span class="n">tf</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">group_pairwise</span><span class="p">,</span> <span class="n">attention_weights</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span>
<span class="p">)</span>  <span class="c1"># [B, D]</span>
</pre></div>
</div>
<p>相比FM对所有特征交叉一视同仁，AFM通过注意力机制自动识别重要的交互模式，提升了模型的表达能力和可解释性。</p>
</section>
<section id="nfm">
<h2><span class="section-number">3.2.1.3. </span>NFM: 交叉特征的深度学习<a class="headerlink" href="#nfm" title="Permalink to this heading">¶</a></h2>
<p>NFM <span id="id3">(<a class="reference internal" href="../../chapter_references/references.html#id47" title="He, X., &amp; Chua, T.-S. (2017). Neural factorization machines for sparse predictive analytics. Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval (pp. 355–364).">He and Chua, 2017</a>)</span> 探索了如何更深入地利用交叉信息。它将 FM
的二阶交叉结果（用哈达玛积表示的向量）作为输入，送入一个深度神经网络（DNN），从而在
FM 的基础上学习更高阶、更复杂的非线性关系。NFM 的核心思想是，<strong>FM
所捕获的二阶交叉信息本身就是一种非常有价值的特征，可以作为“原料”输入给强大的
DNN，由 DNN 来自动学习这些交叉特征之间的高阶组合关系</strong>。</p>
<p>NFM
的结构可以分为两个部分：先做特征交叉，再用深度网络学习。它的关键创新是引入了一个“特征交叉池化层”（Bi-Interaction
Pooling
Layer），这一层的作用很直接——把所有特征对的交叉信息汇总成一个向量，然后送给后面的神经网络去学习更复杂的模式。具体的计算过程如下：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-6">
<span class="eqno">(3.2.7)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-6" title="Permalink to this equation">¶</a></span>\[f_{BI}(V_x) = \sum_{i=1}^n \sum_{j=i+1}^n x_i \mathbf{v}_i \odot x_j \mathbf{v}_j\]</div>
<p>其中 <span class="math notranslate nohighlight">\(V_x = \{x_1 v_1, x_2 v_2, ..., x_n v_n\}\)</span>
是输入样本中所有非零特征的 Embedding 向量集合，<span class="math notranslate nohighlight">\(\odot\)</span>
仍然是元素积操作。这个操作的结果是一个与 Embedding
维度相同的向量，它有效地编码了所有的二阶特征交叉信息。值得注意的是，与FM中的变换类似，这一层的计算同样可以被优化到线性时间复杂度，非常高效：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-7">
<span class="eqno">(3.2.8)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-7" title="Permalink to this equation">¶</a></span>\[f_{BI}(V_x) = \frac{1}{2} \left[\left(\sum_{i=1}^n x_i \mathbf{v}_i\right)^2 - \sum_{i=1}^n (x_i \mathbf{v}_i)^2\right].\]</div>
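<p>式 (3.2.8) 的恒等变换可以用几行 NumPy 直接验证。下面是一个示意性的验证片段（其中的维度与随机数据均为随意假设，并非书中配套实现）：</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
V = rng.normal(size=(n, d))  # n 个非零特征的加权 Embedding，即 x_i * v_i

# 朴素计算：逐对做元素积再求和，复杂度 O(n^2 d)
naive = np.zeros(d)
for i in range(n):
    for j in range(i + 1, n):
        naive += V[i] * V[j]

# 线性时间计算：1/2 * ((∑v)^2 - ∑(v^2))，复杂度 O(n d)
fast = 0.5 * (V.sum(axis=0) ** 2 - (V ** 2).sum(axis=0))

assert np.allclose(naive, fast)
```

<p>两种写法的结果完全一致，但后者只需对 Embedding 做一次求和与一次逐元素平方，这正是双交互池化层能够达到线性复杂度的原因。</p>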
<p>得到特征交叉池化层的输出向量 <span class="math notranslate nohighlight">\(f_{BI}(V_x)\)</span> 后，NFM
将其送入一个标准的多层前馈神经网络（MLP）：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-8">
<span class="eqno">(3.2.9)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-8" title="Permalink to this equation">¶</a></span>\[z_1 = \sigma_1(\textbf{W}_1 f_{BI}(V_x) + \textbf{b}_1),\ \ldots,\ z_L = \sigma_L(\textbf{W}_L z_{L-1} + \textbf{b}_L)\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\textbf{W}_l, \textbf{b}_l, \sigma_l\)</span> 分别是第 <span class="math notranslate nohighlight">\(l\)</span>
个隐藏层的权重、偏置和非线性激活函数。最后，NFM 将一阶线性部分与 DNN
部分的输出结合起来，得到最终的预测结果：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-9">
<span class="eqno">(3.2.10)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-9" title="Permalink to this equation">¶</a></span>\[\hat{y}_{NFM}(x) = w_0 + \sum_{i=1}^n w_i x_i + \textbf{h}^T z_L\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\textbf{h}\)</span> 是预测层的权重向量。通过这种方式，NFM 巧妙地将
FM 的二阶交叉能力与 DNN 的高阶非线性建模能力结合在了一起。FM
可以被看作是 NFM 在没有隐藏层时的特例，这表明 NFM 是对 FM
的一个自然扩展和深度化。</p>
<p><strong>核心代码</strong></p>
<p>NFM的双交互池化层将所有特征对的交叉信息压缩为一个固定维度的向量，作为DNN的输入：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 双交互池化层：1/2 * ((∑v_i)^2 - ∑(v_i^2))</span>
<span class="c1"># inputs: [batch_size, num_features, embedding_dim]</span>

<span class="c1"># (∑v_i)^2：先求和再平方</span>
<span class="n">sum_of_embeds</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, D]</span>
<span class="n">square_of_sum</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">sum_of_embeds</span><span class="p">)</span>  <span class="c1"># [B, D]</span>

<span class="c1"># ∑(v_i^2)：先平方再求和</span>
<span class="n">square_of_embeds</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>  <span class="c1"># [B, N, D]</span>
<span class="n">sum_of_square</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">square_of_embeds</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, D]</span>

<span class="c1"># 双交互池化输出</span>
<span class="n">bi_interaction</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="n">square_of_sum</span> <span class="o">-</span> <span class="n">sum_of_square</span><span class="p">)</span>  <span class="c1"># [B, D]</span>

<span class="c1"># 送入深度神经网络</span>
<span class="n">dnn_output</span> <span class="o">=</span> <span class="n">DNNs</span><span class="p">(</span>
    <span class="n">units</span><span class="o">=</span><span class="p">[</span><span class="mi">64</span><span class="p">,</span> <span class="mi">32</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">,</span> <span class="n">use_bn</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">dropout_rate</span><span class="o">=</span><span class="mf">0.1</span>
<span class="p">)(</span><span class="n">bi_interaction</span><span class="p">)</span>
</pre></div>
</div>
<p>NFM的关键创新在于将FM的二阶交叉信息作为DNN的输入，使得模型既能捕捉特征间的交互，又能学习高阶非线性关系。</p>
</section>
<section id="pnn">
<h2><span class="section-number">3.2.1.4. </span>PNN: 多样化的乘积操作<a class="headerlink" href="#pnn" title="Permalink to this heading">¶</a></h2>
<p>PNN <span id="id4">(<a class="reference internal" href="../../chapter_references/references.html#id41" title="Qu, Y., Cai, H., Ren, K., Zhang, W., Yu, Y., Wen, Y., &amp; Wang, J. (2016). Product-based neural networks for user response prediction. 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1149–1154).">Qu <em>et al.</em>, 2016</a>)</span> 的想法很直接：既然单一的乘积操作（例如内积或元素积）各有局限性，为什么不把多种乘积操作结合起来呢？<strong>PNN
在内积（Inner Product）的基础上，又引入了外积（Outer
Product），希望通过多种乘积操作来更全面地捕捉特征间的交互关系</strong>。它的关键组件是“乘积层”（Product
Layer），这一层会对特征 Embedding
做各种乘积运算，然后把结果传给后面的全连接网络。</p>
<figure class="align-default" id="id11">
<span id="pnn-model-structure"></span><a class="reference internal image-reference" href="../../_images/pnn.png"><img alt="../../_images/pnn.png" src="../../_images/pnn.png" style="width: 400px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.3 </span><span class="caption-text">PNN模型结构</span><a class="headerlink" href="#id11" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>PNN 的乘积层会产生两部分信号，一部分是线性信号
<span class="math notranslate nohighlight">\(\mathbf{l}_z\)</span>，直接来自于各特征的 Embedding 向量，定义为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-10">
<span class="eqno">(3.2.11)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-10" title="Permalink to this equation">¶</a></span>\[\mathbf{l}_z^n = \sum_{i=1}^N\sum_{k=1}^M (\mathbf{W}_z^n)_{i,k} \mathbf{f}_i^k\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\mathbf{f}_i\)</span> 是特征的 Embedding
向量，<span class="math notranslate nohighlight">\(\mathbf{W}_z^n\)</span> 是第 <span class="math notranslate nohighlight">\(n\)</span>
个神经元对应的线性信号权重矩阵。<span class="math notranslate nohighlight">\(N\)</span> 为特征字段数量，<span class="math notranslate nohighlight">\(M\)</span>
为 Embedding 维数。</p>
<p>另一部分是二次信号 <span class="math notranslate nohighlight">\(\mathbf{l}_p\)</span>，来自于特征 Embedding
之间的两两交互。根据交互方式的不同，PNN 的二次信号分为两种主要的变体：</p>
<p><strong>IPNN (Inner Product-based Neural Network)</strong>: 这种变体使用特征
Embedding 之间的<strong>内积</strong>来计算二次信号。一个直接的计算方式是：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-11">
<span class="eqno">(3.2.12)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-11" title="Permalink to this equation">¶</a></span>\[\mathbf{l}_p^n = \sum_{i=1}^N \sum_{j=1}^N (\textbf{W}_p^n)_{i,j} \langle \mathbf{f}_i, \mathbf{f}_j \rangle\]</div>
<p><span class="math notranslate nohighlight">\(\textbf{W}_p^n\)</span> 是第 <span class="math notranslate nohighlight">\(n\)</span>
个神经元对应的权重矩阵。这种计算方式的复杂度是
<span class="math notranslate nohighlight">\(O(N^2)\)</span>，<span class="math notranslate nohighlight">\(N\)</span> 为特征字段数量，开销巨大。为了优化，PNN
引入了矩阵分解技巧，将权重矩阵 <span class="math notranslate nohighlight">\(\textbf{W}_p^n\)</span> 分解为
<span class="math notranslate nohighlight">\(\theta_n \theta_n^T\)</span>，即
<span class="math notranslate nohighlight">\((\textbf{W}_p^n)_{i,j} = \theta_i^n \theta_j^n\)</span>。于是，计算过程可以被重写和简化：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-12">
<span class="eqno">(3.2.13)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-12" title="Permalink to this equation">¶</a></span>\[\mathbf{l}_p^n = \sum_{i=1}^N \sum_{j=1}^N \theta_i^n \theta_j^n \langle \mathbf{f}_i, \mathbf{f}_j \rangle = \sum_{i=1}^N \sum_{j=1}^N \langle \theta_i^n \mathbf{f}_i, \theta_j^n \mathbf{f}_j \rangle = \langle \sum_{i=1}^N \theta_i^n \mathbf{f}_i, \sum_{j=1}^N \theta_j^n \mathbf{f}_j \rangle = \left\|\sum_{i=1}^N \theta_i^n \mathbf{f}_i\right\|^2\]</div>
<p>通过这个变换，<strong>计算所有内积对的加权和，转变成了先对 Embedding
进行加权求和，然后计算一次向量的 L2 范数平方</strong>，复杂度成功地从
<span class="math notranslate nohighlight">\(O(N^2M)\)</span> 降低到了 <span class="math notranslate nohighlight">\(O(NM)\)</span>。</p>
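<p>式 (3.2.13) 的化简同样可以数值验证。下面的示意片段中，θ 与 f 均为随机生成的假设数据：</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 8
F = rng.normal(size=(N, M))  # N 个特征字段的 Embedding f_i
theta = rng.normal(size=N)   # 某个神经元对应的分解向量 θ^n

# 朴素计算：所有内积对的加权和，复杂度 O(N^2 M)
naive = sum(theta[i] * theta[j] * (F[i] @ F[j])
            for i in range(N) for j in range(N))

# 优化计算：先对 Embedding 加权求和，再取 L2 范数平方，复杂度 O(N M)
s = (theta[:, None] * F).sum(axis=0)
fast = s @ s

assert np.isclose(naive, fast)
```

<p>注意这里的双重求和包含 <span class="math notranslate nohighlight">\(i=j\)</span> 的项，与式 (3.2.13) 的写法一致。</p>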
<p>优化后的完整计算公式为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-13">
<span class="eqno">(3.2.14)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-13" title="Permalink to this equation">¶</a></span>\[\mathbf{l}_p = \left(\left\|\sum_{i=1}^N \theta_i^1 \mathbf{f}_i\right\|^2, \left\|\sum_{i=1}^N \theta_i^2 \mathbf{f}_i\right\|^2, \ldots, \left\|\sum_{i=1}^N \theta_i^n \mathbf{f}_i\right\|^2\right)\]</div>
<p><strong>OPNN (Outer Product-based Neural Network)</strong>: 这种变体使用特征
Embedding 之间的<strong>外积</strong> <span class="math notranslate nohighlight">\(\mathbf{f}_i\mathbf{f}_j^T\)</span>
来捕捉更丰富的交互信息。外积会产生一个矩阵，如果对所有外积对进行加权求和
<span class="math notranslate nohighlight">\(\sum_{i=1}^N \sum_{j=1}^N \mathbf{f}_i \mathbf{f}_j^T\)</span>，计算复杂度会达到
<span class="math notranslate nohighlight">\(O(N^2M^2)\)</span>（<span class="math notranslate nohighlight">\(M\)</span> 为 Embedding
维数），这在实践中是不可行的。OPNN
采用了一种称为“叠加”（superposition）的近似方法来大幅降低复杂度。它不再计算所有成对的外积，而是<strong>先将所有特征的
Embedding 向量相加，然后再计算一次外积</strong>：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-14">
<span class="eqno">(3.2.15)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-14" title="Permalink to this equation">¶</a></span>\[\sum_{i=1}^N \sum_{j=1}^N \mathbf{f}_i \mathbf{f}_j^T = (\sum_{i=1}^N \mathbf{f}_i)(\sum_{j=1}^N \mathbf{f}_j)^T\]</div>
<p>这样，计算复杂度从 <span class="math notranslate nohighlight">\(O(N^2M^2)\)</span> 降低到了 <span class="math notranslate nohighlight">\(O(M(M+N))\)</span>。优化后的完整计算公式为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-15">
<span class="eqno">(3.2.16)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-15" title="Permalink to this equation">¶</a></span>\[\mathbf{l}_p = \left(\langle\mathbf{W}_p^1, (\sum_{i=1}^N \mathbf{f}_i)(\sum_{j=1}^N \mathbf{f}_j)^T\rangle, \langle\mathbf{W}_p^2, (\sum_{i=1}^N \mathbf{f}_i)(\sum_{j=1}^N \mathbf{f}_j)^T\rangle, \ldots, \langle\mathbf{W}_p^n, (\sum_{i=1}^N \mathbf{f}_i)(\sum_{j=1}^N \mathbf{f}_j)^T\rangle\right)\]</div>
<p>其中对称矩阵 <span class="math notranslate nohighlight">\(\mathbf{W}_p^n \in \mathbb{R}^{M \times M}\)</span> 是第
<span class="math notranslate nohighlight">\(n\)</span>
个神经元对应的权重矩阵，矩阵内积定义为 <span class="math notranslate nohighlight">\(\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i=1}^M \sum_{j=1}^M \mathbf{A}_{i,j} \mathbf{B}_{i,j}\)</span>。</p>
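<p>外积的“叠加”技巧也可以用同样的方式验证：所有外积对之和恰好等于“先对 Embedding 求和、再做一次外积”。下面是示意片段（数据与维度均为随机假设）：</p>

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 6, 4
F = rng.normal(size=(N, M))  # N 个特征字段的 Embedding

# 朴素计算：N^2 个 M×M 外积矩阵求和，复杂度 O(N^2 M^2)
naive = sum(np.outer(F[i], F[j]) for i in range(N) for j in range(N))

# 叠加：先求和再做一次外积，复杂度 O(M(M+N))
s = F.sum(axis=0)
fast = np.outer(s, s)

assert np.allclose(naive, fast)

# 第 n 个神经元的输出：与权重矩阵做矩阵内积
W = rng.normal(size=(M, M))
lp_n = (W * fast).sum()
```

<p>可以看到，在“求和”这一意义下叠加是精确恒等的；近似体现在放弃了对每个外积对单独加权。</p>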
<p>在得到线性信号 <span class="math notranslate nohighlight">\(l_z\)</span> 和经过优化的二次信号 <span class="math notranslate nohighlight">\(l_p\)</span> 后，PNN
将它们合并，并送入后续的全连接层进行高阶非线性变换：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-16">
<span class="eqno">(3.2.17)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-16" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\mathbf{l}_1 &amp;= \text{ReLU}(\mathbf{l}_z + \mathbf{l}_p + \mathbf{b}_1) \\
\mathbf{l}_2 &amp;= \text{ReLU}(\mathbf{W}_2 \mathbf{l}_1 + \mathbf{b}_2) \\
\hat{y} &amp;= \sigma(\textbf{W}_3 \mathbf{l}_2 + b_3)
\end{aligned}\end{split}\]</div>
<p>PNN
的独特之处在于，它将“乘积”操作（无论是内积还是外积）作为了网络中的一个核心计算单元，认为这种操作比传统
DNN 中简单的“加法”操作更能有效地捕捉类别型特征之间的交互关系。</p>
<p><strong>核心代码</strong></p>
<p>PNN通过内积和外积两种方式计算特征交互。以IPNN的优化实现为例：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># inputs: 长度为 N 的特征 embedding 列表，每个元素形状为 [B, D]</span>
<span class="n">embed_3d</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, N, D]</span>

<span class="c1"># 线性信号：直接对拼接后的特征embedding做全连接</span>
<span class="n">concat_embed</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">concat</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, N*D]</span>
<span class="n">lz</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">concat_embed</span><span class="p">,</span> <span class="n">linear_w</span><span class="p">)</span>  <span class="c1"># [B, units]</span>

<span class="c1"># 内积优化：||∑(θ_i · f_i)||^2 代替 ∑∑&lt;θ_i·f_i, θ_j·f_j&gt;</span>
<span class="n">lp_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">units</span><span class="p">):</span>
    <span class="c1"># 对每个特征加权：θ_i · f_i</span>
    <span class="n">delta</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span>
        <span class="n">embed_3d</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">inner_w</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">)</span>  <span class="c1"># [B, N, D]</span>

    <span class="c1"># 求和后计算L2范数平方：||∑(θ_i · f_i)||^2</span>
    <span class="n">delta</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">delta</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, D]</span>
    <span class="n">lp_i</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">delta</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="c1"># [B, 1]</span>
    <span class="n">lp_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">lp_i</span><span class="p">)</span>

<span class="c1"># 拼接线性信号和内积信号</span>
<span class="n">lp</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">concat</span><span class="p">(</span><span class="n">lp_list</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, units]</span>
<span class="n">product_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">lz</span><span class="p">,</span> <span class="n">lp</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, 2*units]</span>
</pre></div>
</div>
<p>通过矩阵分解技巧，PNN将内积计算从 <span class="math notranslate nohighlight">\(O(N^2M)\)</span> 优化到
<span class="math notranslate nohighlight">\(O(NM)\)</span>，使得模型能够高效处理大规模特征交互。</p>
</section>
<section id="fibinet">
<h2><span class="section-number">3.2.1.5. </span>FiBiNET: 特征重要性与双线性交互<a class="headerlink" href="#fibinet" title="Permalink to this heading">¶</a></h2>
<p>PNN
用了多种乘积操作来做特征交互，但它把所有特征都当作同等重要。实际上，在推荐系统中，不同特征的重要性是不一样的。FiBiNET
(Feature Importance and Bilinear feature Interaction Network)
<span id="id5">(<a class="reference internal" href="../../chapter_references/references.html#id22" title="Huang, T., Zhang, Z., &amp; Zhang, J. (2019). Fibinet: combining feature importance and bilinear feature interaction for click-through rate prediction. Proceedings of the 13th ACM conference on recommender systems (pp. 169–177).">Huang <em>et al.</em>, 2019</a>)</span>
针对这个问题，<strong>先学习每个特征的重要性权重，再根据权重来做特征交互，最后通过双线性交互来建模特征关系</strong>。这样模型可以更有针对性地处理重要特征。</p>
<figure class="align-default" id="id12">
<span id="fibinet-architecture"></span><a class="reference internal image-reference" href="../../_images/fibinet_architecture.png"><img alt="../../_images/fibinet_architecture.png" src="../../_images/fibinet_architecture.png" style="width: 500px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.4 </span><span class="caption-text">FiBiNET模型结构</span><a class="headerlink" href="#id12" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>FiBiNET 的创新主要体现在两个核心模块上：<strong>SENET
特征重要性学习机制</strong>和<strong>双线性交互层</strong>。</p>
<p><strong>SENET 特征重要性学习</strong></p>
<p>FiBiNET 引入了来自计算机视觉领域的 <strong>SENET (Squeeze-and-Excitation
Network)</strong> <span id="id6">(<a class="reference internal" href="../../chapter_references/references.html#id21" title="Hu, J., Shen, L., &amp; Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7132–7141).">Hu <em>et al.</em>, 2018</a>)</span>
机制，用于动态学习每个特征的重要性权重。与传统方法对所有特征一视同仁不同，SENET
能够自适应地为不同特征分配不同的权重，让模型更加关注那些对预测任务更重要的特征。</p>
<figure class="align-default" id="id13">
<span id="fibinet-senet-structure"></span><a class="reference internal image-reference" href="../../_images/fibinet_senet.png"><img alt="../../_images/fibinet_senet.png" src="../../_images/fibinet_senet.png" style="width: 400px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.5 </span><span class="caption-text">SENET层结构详解</span><a class="headerlink" href="#id13" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>SENET 的工作流程分为三个步骤：</p>
<ol class="arabic">
<li><p><strong>Squeeze (压缩)</strong>: 把每个特征的 <span class="math notranslate nohighlight">\(k\)</span> 维向量
<span class="math notranslate nohighlight">\(\mathbf{e}_i\)</span> 压缩成一个数值，方法是计算向量的平均值：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-17">
<span class="eqno">(3.2.18)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-17" title="Permalink to this equation">¶</a></span>\[\mathbf{z}_i = F_{\text{sq}}(\mathbf{e}_i) = \frac{1}{k} \sum_{t=1}^k \mathbf{e}_i(t)\]</div>
</li>
<li><p><strong>Excitation (激活)</strong>:
用两层神经网络来学习特征之间的关系，输出每个特征的重要性分数：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-18">
<span class="eqno">(3.2.19)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-18" title="Permalink to this equation">¶</a></span>\[\mathbf{A} = F_{\text{ex}}(\mathbf{Z}) = \sigma_2(\mathbf{W}_2 \sigma_1(\mathbf{W}_1 \mathbf{Z}))\]</div>
<p>这里 <span class="math notranslate nohighlight">\(\mathbf{W}_1 \in \mathbb{R}^{f \times \frac{f}{r}}\)</span> 和
<span class="math notranslate nohighlight">\(\mathbf{W}_2 \in \mathbb{R}^{\frac{f}{r} \times f}\)</span>
是网络的权重，<span class="math notranslate nohighlight">\(r\)</span> 是控制网络大小的参数。</p>
</li>
<li><p><strong>Re-weight (重新加权)</strong>: 用学到的重要性分数来调整原始特征向量：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-19">
<span class="eqno">(3.2.20)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-19" title="Permalink to this equation">¶</a></span>\[\mathbf{V} = F_{\text{ReWeight}}(\mathbf{A}, \mathbf{E}) = [\mathbf{a}_1 \cdot \mathbf{e}_1, \mathbf{a}_2 \cdot \mathbf{e}_2, \ldots, \mathbf{a}_f \cdot \mathbf{e}_f]\]</div>
</li>
</ol>
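<p>上面的三个步骤可以用 NumPy 写成一个极简的示意（W1、W2 为随机初始化的假设权重，两层激活分别取 ReLU 和 Sigmoid，这只是常见的一种取法）：</p>

```python
import numpy as np

rng = np.random.default_rng(3)
f, k, r = 8, 16, 4           # 特征字段数、Embedding 维度、缩减比例
E = rng.normal(size=(f, k))  # 原始 Embedding 矩阵

# Squeeze：把每个特征的向量压缩成一个标量
z = E.mean(axis=1)  # [f]

# Excitation：两层网络输出每个特征的重要性分数
W1 = rng.normal(size=(f, f // r))
W2 = rng.normal(size=(f // r, f))
relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
a = sigmoid(relu(z @ W1) @ W2)  # [f]，每个分量都落在 (0, 1) 内

# Re-weight：按重要性分数缩放每个特征向量
V = a[:, None] * E  # [f, k]

assert V.shape == E.shape
```

<p>缩减比例 <span class="math notranslate nohighlight">\(r\)</span> 越大，Excitation 网络的参数越少，这也是 SENET 控制额外开销的主要手段。</p>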
<p><strong>双线性交互层</strong></p>
<p>有了原始嵌入 <span class="math notranslate nohighlight">\(\mathbf{E}\)</span> 和 SENET 加权后的嵌入
<span class="math notranslate nohighlight">\(\mathbf{V}\)</span>，FiBiNET
接下来要解决如何更好地建模特征交互的问题。传统的 FM 使用内积，PNN
使用元素积，而 FiBiNET
采用了双线性交互的方式。它引入一个可学习的变换矩阵
<span class="math notranslate nohighlight">\(\mathbf{W} \in \mathbb{R}^{k \times k}\)</span>：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-20">
<span class="eqno">(3.2.21)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-20" title="Permalink to this equation">¶</a></span>\[\mathbf{p}_{ij} = \mathbf{v}_i \cdot \mathbf{W} \circ \mathbf{v}_j\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\circ\)</span> 表示哈达玛积。这个变换矩阵 <span class="math notranslate nohighlight">\(\mathbf{W}\)</span>
的作用是在计算特征交互前，先对其中一个特征向量进行线性变换，从而打破了传统方法中特征向量各维度对称交互的限制。</p>
<p>FiBiNET 分别对原始嵌入 <span class="math notranslate nohighlight">\(\mathbf{E}\)</span> 和加权嵌入 <span class="math notranslate nohighlight">\(\mathbf{V}\)</span>
进行双线性交互，再将两组交互结果与原始特征、深度网络输出一起送入最终的预测层。这样，<strong>FiBiNET
既考虑了特征重要性，又增强了特征交互的表达能力</strong>。</p>
<p><strong>核心代码</strong></p>
<p>FiBiNET的实现包含两个关键部分：SENET特征重要性学习和双线性交互。</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1. SENET特征重要性学习</span>
<span class="c1"># inputs: [batch_size, num_features, embedding_dim]</span>

<span class="c1"># Squeeze：全局平均池化</span>
<span class="n">squeeze</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_mean</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [B, N]</span>

<span class="c1"># Excitation：两层全连接网络</span>
<span class="n">excitation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">squeeze</span><span class="p">,</span> <span class="n">w1</span><span class="p">)</span>  <span class="c1"># [B, reduction_size]</span>
<span class="n">excitation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">excitation</span><span class="p">)</span>
<span class="n">excitation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">excitation</span><span class="p">,</span> <span class="n">w2</span><span class="p">)</span>  <span class="c1"># [B, N]</span>
<span class="n">excitation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">excitation</span><span class="p">)</span>

<span class="c1"># Re-weight：应用注意力权重</span>
<span class="n">excitation</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">excitation</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>  <span class="c1"># [B, N, 1]</span>
<span class="n">senet_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">inputs</span><span class="p">,</span> <span class="n">excitation</span><span class="p">)</span>  <span class="c1"># [B, N, D]</span>

<span class="c1"># 2. 双线性交互：v_i · W ⊙ v_j</span>
<span class="n">interaction_outputs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span>  <span class="c1"># 每个特征对 (i, j) 使用一个独立的变换矩阵</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_features</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">num_features</span><span class="p">):</span>
        <span class="c1"># 对特征i应用变换矩阵</span>
        <span class="n">vi_transformed</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">inputs</span><span class="p">[:,</span> <span class="n">i</span><span class="p">,</span> <span class="p">:],</span> <span class="n">W_list</span><span class="p">[</span><span class="n">idx</span><span class="p">])</span>  <span class="c1"># [B, D]</span>
        <span class="c1"># 与特征j做元素积</span>
        <span class="n">interaction</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">vi_transformed</span><span class="p">,</span> <span class="n">inputs</span><span class="p">[:,</span> <span class="n">j</span><span class="p">,</span> <span class="p">:])</span>  <span class="c1"># [B, D]</span>
        <span class="n">interaction_outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">interaction</span><span class="p">)</span>
        <span class="n">idx</span> <span class="o">+=</span> <span class="mi">1</span>
</pre></div>
</div>
<p>FiBiNET通过SENET动态调整特征重要性，通过双线性变换增强特征交互的表达能力，相比传统方法更加灵活和高效。</p>
</section>
<section id="deepfm">
<span id="id7"></span><h2><span class="section-number">3.2.1.6. </span>DeepFM: 低阶高阶的统一建模<a class="headerlink" href="#deepfm" title="Permalink to this heading">¶</a></h2>
<p>DeepFM <span id="id8">(<a class="reference internal" href="../../chapter_references/references.html#id50" title="Guo, H., Tang, R., Ye, Y., Li, Z., &amp; He, X. (2017). Deepfm: a factorization-machine based neural network for ctr prediction. arXiv preprint arXiv:1703.04247.">Guo <em>et al.</em>, 2017</a>)</span> 是对 Wide &amp; Deep
架构的直接改进和优化。它将 Wide &amp; Deep 中需要大量人工特征工程的 Wide
部分，<strong>直接替换为了一个无需任何人工干预的 FM
模型</strong>，从而实现了真正的端到端训练。更关键的是，DeepFM 中的 <strong>FM
组件和 Deep
组件共享同一份特征嵌入（Embedding）</strong>，这带来了两大好处：首先，模型可以同时从原始特征中学习低阶和高阶的特征交互；其次，共享
Embedding 的方式使得模型训练更加高效。</p>
<figure class="align-default" id="id14">
<span id="deepfm-architecture"></span><a class="reference internal image-reference" href="../../_images/deepfm_architecture.png"><img alt="../../_images/deepfm_architecture.png" src="../../_images/deepfm_architecture.png" style="width: 400px;" /></a>
<figcaption>
<p><span class="caption-number">图3.2.6 </span><span class="caption-text">DeepFM模型结构</span><a class="headerlink" href="#id14" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>DeepFM 的结构非常清晰，它由 FM 和 DNN 两个并行的组件构成，两者共享输入。</p>
<ul class="simple">
<li><p><strong>FM 组件</strong>: 负责学习一阶特征和二阶特征交叉。其输出 <span class="math notranslate nohighlight">\(y_{FM}\)</span>
的计算方式与标准 FM 完全相同：</p></li>
</ul>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-21">
<span class="eqno">(3.2.22)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-21" title="Permalink to this equation">¶</a></span>\[y_{FM} = \langle w, x \rangle + \sum_{i=1}^d \sum_{j=i+1}^d \langle V_{i}, V_{j} \rangle x_{i}x_{j}\]</div>
<p>这里的 <span class="math notranslate nohighlight">\(V_{i}\)</span> 就是特征 <span class="math notranslate nohighlight">\(i\)</span> 的 Embedding 向量。</p>
<ul class="simple">
<li><p><strong>Deep 组件</strong>: 负责学习高阶的非线性特征交叉。它的输入正是 FM
组件中所使用的那一套 Embedding
向量。具体来说，所有输入特征首先被映射到它们的低维 Embedding
向量上，然后这些 Embedding 向量被拼接（Concatenate）在一起，作为 DNN
的输入。</p></li>
</ul>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-22">
<span class="eqno">(3.2.23)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-22" title="Permalink to this equation">¶</a></span>\[a^{(0)} = [e_1, e_2, ..., e_m]\]</div>
<p>其中 <span class="math notranslate nohighlight">\(e_i\)</span> 是第 <span class="math notranslate nohighlight">\(i\)</span> 个特征字段的 Embedding
向量。这个拼接后的向量随后被送入一个标准的前馈神经网络，前向传播公式为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-23">
<span class="eqno">(3.2.24)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-23" title="Permalink to this equation">¶</a></span>\[a^{(l+1)} = \sigma(\textbf{W}^{(l)} a^{(l)} + \textbf{b}^{(l)})\]</div>
<p>其中 <span class="math notranslate nohighlight">\(l\)</span> 是层深度，<span class="math notranslate nohighlight">\(\sigma\)</span>
是激活函数，<span class="math notranslate nohighlight">\(\textbf{W}^{(l)}\)</span>、<span class="math notranslate nohighlight">\(\textbf{b}^{(l)}\)</span>分别是第
<span class="math notranslate nohighlight">\(l\)</span> 层的权重和偏置。最后输出为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-24">
<span class="eqno">(3.2.25)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-24" title="Permalink to this equation">¶</a></span>\[y_{Deep} = \textbf{W}^{|H|+1} \cdot a^{|H|} + \textbf{b}^{|H|+1}\]</div>
<p>其中 <span class="math notranslate nohighlight">\(|H|\)</span> 是隐藏层的数量。</p>
<p>最终，DeepFM 将 FM 部分和 Deep 部分输出的 Logits 直接相加，再通过
Sigmoid 函数得到点击率预测：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-2-feature-crossing-1-second-order-25">
<span class="eqno">(3.2.26)<a class="headerlink" href="#equation-chapter-2-ranking-2-feature-crossing-1-second-order-25" title="Permalink to this equation">¶</a></span>\[\hat{y} = \sigma(y_{FM} + y_{Deep})\]</div>
<p>DeepFM 的核心思路很简单：用 FM
处理低阶特征交互，用深度网络处理高阶特征交互，两者共享同一套
Embedding。这样做的好处是减少了人工特征工程的工作量，模型可以自动学习各种特征交互模式。相比
Wide &amp; Deep 需要专家手工构造 Wide 部分的特征，DeepFM 用 FM
替代了这部分工作。</p>
<p><strong>核心代码</strong></p>
<p>DeepFM的关键在于FM和DNN两个组件共享同一套Embedding，各自负责不同层次的特征交互：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 获取共享的特征embedding</span>
<span class="c1"># concat_feature: [batch_size, num_features, embedding_dim]</span>
<span class="n">concat_feature</span> <span class="o">=</span> <span class="n">concat_group_embedding</span><span class="p">(</span>
    <span class="n">group_embedding_feature_dict</span><span class="p">,</span> <span class="n">group_name</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">flatten</span><span class="o">=</span><span class="kc">False</span>
<span class="p">)</span>

<span class="c1"># 1. FM组件：学习二阶特征交叉</span>
<span class="n">fm_output</span> <span class="o">=</span> <span class="n">FM</span><span class="p">()(</span><span class="n">concat_feature</span><span class="p">)</span>  <span class="c1"># [B, 1]</span>

<span class="c1"># 2. DNN组件：学习高阶非线性特征交叉</span>
<span class="c1"># 将embedding展平作为DNN输入</span>
<span class="n">flatten_feature</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">()(</span><span class="n">concat_feature</span><span class="p">)</span>  <span class="c1"># [B, N*D]</span>
<span class="n">dnn_output</span> <span class="o">=</span> <span class="n">DNNs</span><span class="p">(</span>
    <span class="n">units</span><span class="o">=</span><span class="p">[</span><span class="mi">64</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>  <span class="c1"># 多层神经网络</span>
    <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">,</span>
    <span class="n">dropout_rate</span><span class="o">=</span><span class="mf">0.1</span>
<span class="p">)(</span><span class="n">flatten_feature</span><span class="p">)</span>  <span class="c1"># [B, 1]</span>

<span class="c1"># 3. 联合训练：将FM和DNN的输出相加</span>
<span class="n">deepfm_logits</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">fm_output</span><span class="p">,</span> <span class="n">dnn_output</span><span class="p">)</span>  <span class="c1"># [B, 1]</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;sigmoid&quot;</span><span class="p">)(</span><span class="n">deepfm_logits</span><span class="p">)</span>
</pre></div>
</div>
<p>DeepFM通过共享Embedding实现了端到端训练，FM组件捕捉低阶交叉，DNN组件学习高阶模式，两者互补形成高效的特征学习能力。</p>
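<p>把前面各公式串起来，DeepFM 的一次完整前向计算可以浓缩为下面的 NumPy 示意（非原文实现，参数形状与初始化均为示例假设）：</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_predict(x, w, V, Ws, bs):
    """y_hat = sigma(y_FM + y_Deep) 的端到端数值示意。

    x: (d,) 特征取值; w: (d,) 一阶权重; V: (d, k) 共享 Embedding;
    Ws/bs: DNN 各层权重与偏置（最后一层输出 1 维 logit, 不加激活）。
    """
    xv = x[:, None] * V                                  # 共享 Embedding: x_i * V_i
    # FM: 一阶项 <w, x> + 二阶项（恒等式展开, O(d*k)）
    y_fm = w @ x + 0.5 * ((xv.sum(0) ** 2) - (xv ** 2).sum(0)).sum()
    # Deep: 展平同一套 Embedding 后送入 MLP
    a = xv.ravel()                                       # a^(0), 维度 d*k
    for l, (W, b) in enumerate(zip(Ws, bs)):
        a = W @ a + b
        if l < len(Ws) - 1:
            a = np.maximum(a, 0.0)                       # ReLU
    return sigmoid(y_fm + a.item())                      # sigma(y_FM + y_Deep)

rng = np.random.default_rng(2)
d, k = 5, 3
x, w = rng.normal(size=d), rng.normal(size=d)
V = rng.normal(size=(d, k)) * 0.1
Ws = [rng.normal(size=(8, d * k)) * 0.1, rng.normal(size=(1, 8)) * 0.1]
bs = [np.zeros(8), np.zeros(1)]
p = deepfm_predict(x, w, V, Ws, bs)                      # (0, 1) 之间的点击率
```

注意 FM 项和 Deep 项读取的是同一个 <code>xv</code>，这正是“共享 Embedding、端到端训练”在计算图上的含义。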
<p><strong>代码实践</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">funrec</span><span class="w"> </span><span class="kn">import</span> <span class="n">compare_models</span>

<span class="n">compare_models</span><span class="p">([</span><span class="s1">&#39;fm&#39;</span><span class="p">,</span> <span class="s1">&#39;afm&#39;</span><span class="p">,</span> <span class="s1">&#39;nfm&#39;</span><span class="p">,</span> <span class="s1">&#39;pnn&#39;</span><span class="p">,</span> <span class="s1">&#39;fibinet&#39;</span><span class="p">,</span> <span class="s1">&#39;deepfm&#39;</span><span class="p">])</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">模型</span>    <span class="o">|</span>    <span class="n">auc</span> <span class="o">|</span>   <span class="n">gauc</span> <span class="o">|</span>   <span class="n">val_user</span> <span class="o">|</span>
<span class="o">+=========+========+========+============+</span>
<span class="o">|</span> <span class="n">fm</span>      <span class="o">|</span> <span class="mf">0.5888</span> <span class="o">|</span> <span class="mf">0.5707</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">afm</span>     <span class="o">|</span> <span class="mf">0.5926</span> <span class="o">|</span> <span class="mf">0.5659</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">nfm</span>     <span class="o">|</span> <span class="mf">0.5855</span> <span class="o">|</span> <span class="mf">0.5584</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">pnn</span>     <span class="o">|</span> <span class="mf">0.5953</span> <span class="o">|</span> <span class="mf">0.5692</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">fibinet</span> <span class="o">|</span> <span class="mf">0.5965</span> <span class="o">|</span> <span class="mf">0.5729</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
<span class="o">|</span> <span class="n">deepfm</span>  <span class="o">|</span> <span class="mf">0.6027</span> <span class="o">|</span> <span class="mf">0.5728</span> <span class="o">|</span>        <span class="mi">928</span> <span class="o">|</span>
<span class="o">+---------+--------+--------+------------+</span>
</pre></div>
</div>
</section>
</section>


        </div>
        <div class="side-doc-outline">
            <div class="side-doc-outline--content"> 
<div class="localtoc">
    <p class="caption">
      <span class="caption-text">Table Of Contents</span>
    </p>
    <ul>
<li><a class="reference internal" href="#">3.2.1. 二阶特征交叉</a><ul>
<li><a class="reference internal" href="#fm">3.2.1.1. FM: 从召回到精排的华丽转身</a></li>
<li><a class="reference internal" href="#afm">3.2.1.2. AFM: 注意力加权的交叉特征</a></li>
<li><a class="reference internal" href="#nfm">3.2.1.3. NFM: 交叉特征的深度学习</a></li>
<li><a class="reference internal" href="#pnn">3.2.1.4. PNN: 多样化的乘积操作</a></li>
<li><a class="reference internal" href="#fibinet">3.2.1.5. FiBiNET: 特征重要性与双线性交互</a></li>
<li><a class="reference internal" href="#deepfm">3.2.1.6. DeepFM: 低阶高阶的统一建模</a></li>
</ul>
</li>
</ul>

</div>
            </div>
        </div>

      <div class="clearer"></div>
    </div><div class="pagenation">
     <a id="button-prev" href="index.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="P">
         <i class="pagenation-arrow-L fas fa-arrow-left fa-lg"></i>
         <div class="pagenation-text">
            <span class="pagenation-direction">Previous</span>
            <div>3.2. 特征交叉</div>
         </div>
     </a>
     <a id="button-next" href="2.higher_order.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="N">
         <i class="pagenation-arrow-R fas fa-arrow-right fa-lg"></i>
        <div class="pagenation-text">
            <span class="pagenation-direction">Next</span>
            <div>3.2.2. 高阶特征交叉</div>
        </div>
     </a>
  </div>
        
        </main>
    </div>
  </body>
</html>